11 research outputs found

    Estimate Sequences for Stochastic Composite Optimization: Variance Reduction, Acceleration, and Robustness to Noise

    Get PDF
    In this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. More precisely, we interpret a large class of stochastic optimization methods as procedures that iteratively minimize a surrogate of the objective, which covers the stochastic gradient descent method and variants of the incremental approaches SAGA, SVRG, and MISO/Finito/SDCA. This point of view has several advantages: (i) we provide a simple generic proof of convergence for all of the aforementioned methods; (ii) we naturally obtain new algorithms with the same guarantees; (iii) we derive generic strategies to make these algorithms robust to stochastic noise, which is useful when data is corrupted by small random perturbations. Finally, we propose a new accelerated stochastic gradient descent algorithm and an accelerated SVRG algorithm with optimal complexity that is robust to stochastic noise.Comment: Journal of Machine Learning Research, Microtome Publishing, In pres

    Sparse recovery by reduced variance stochastic approximation

    Full text link
    In this paper, we discuss application of iterative Stochastic Optimization routines to the problem of sparse signal recovery from noisy observation. Using Stochastic Mirror Descent algorithm as a building block, we develop a multistage procedure for recovery of sparse solutions to Stochastic Optimization problem under assumption of smoothness and quadratic minoration on the expected objective. An interesting feature of the proposed algorithm is its linear convergence of the approximate solution during the preliminary phase of the routine when the component of stochastic error in the gradient observation which is due to bad initial approximation of the optimal solution is larger than the "ideal" asymptotic error component owing to observation noise "at the optimal solution." We also show how one can straightforwardly enhance reliability of the corresponding solution by using Median-of-Means like techniques. We illustrate the performance of the proposed algorithms in application to classical problems of recovery of sparse and low rank signals in linear regression framework. We show, under rather weak assumption on the regressor and noise distributions, how they lead to parameter estimates which obey (up to factors which are logarithmic in problem dimension and confidence level) the best known to us accuracy bounds

    Stochastic optimization for large-scale machine learning : variance reduction and acceleration

    No full text
    Cette thèse vise à explorer divers sujets liés à l'analyse des méthodes de premier ordre appliquées à des problèmes stochastiques de grande dimension. Notre première contribution porte sur divers algorithmes incrémentaux, tels que SVRG, SAGA, MISO, SDCA, qui ont été analysés de manière approfondie pour les problèmes avec des informations de gradient exactes. Nous proposons une nouvelle technique, qui permet de traiter ces méthodes de manière unifiée et de démontrer leur robustesse à des perturbations stochastiques lors de l'observation des gradients. Notre approche est basée sur une extension du concept de suite d'estimation introduite par Yurii Nesterov pour l'analyse d'algorithmes déterministes accélérés.Cette approche permet de concevoir de façon naturelle de nouveaux algorithmes incrémentaux offrant les mêmes garanties que les méthodes existantes tout en étant robustes aux perturbations stochastiques.Enfin, nous proposons un nouvel algorithme de descente de gradient stochastique accéléré et un nouvel algorithme SVRG accéléré robuste au bruit stochastique. Dans le dernier cas il s'agit essentiellement de l'accélération déterministe au sens de Nesterov, qui préserve la convergence optimale des erreurs stochastiques.Finalement, nous abordons le problème de l'accélération générique. Pour cela, nous étendons l'approche multi-étapes de Catalyst, qui visait à l'origine l'accélération de méthodes déterministes. Afin de l'appliquer aux problèmes stochastiques, nous le modifions pour le rendre plus flexible par rapport au choix des fonctions auxiliaires minimisées à chaque étape de l'algorithme. Finalement, à partir d'une méthode d'optimisation pour les problèmes fortement convexes, avec des garanties standard de convergence, notre procédure commence par accélérer la convergence vers une région dominée par le bruit, pour converger avec une vitesse quasi-optimale ensuite. Cette approche nous permet d'accélérer diverses méthodes stochastiques, y compris les algorithmes à variance réduite. Là encore, le cadre développé présente des similitudes avec l'analyse d'algorithmes accélérés à l'aide des suites d'estimation. En ce sens, nous essayons de combler l'écart entre l'optimisation déterministe et stochastique en termes d'accélération de Nesterov. Une autre contribution est une analyse unifiée d'algorithmes proximaux stochastiques lorsque l'opérateur proximal ne peut pas être calculé de façon exacte.Ensuite, nous étudions des propriétés d'algorithmes stochastique non-Euclidiens appliqués au problème d'estimation parcimonieuse. La structure de parcimonie permet de réduire de façon significative les effets du bruit dans les observation du gradient. Nous proposons un nouvel algorithme stochastique, appelé SMD-SR, permettant de faire meilleur usage de cette structure. Là encore, la méthode en question est une routine multi-étapes qui utilise l'algorithme stochastique de descente en miroir comme élément constitutif de ses étapes. Cette procédure comporte deux phases de convergence, dont la convergence linéaire de l'erreur pendant la phase préliminaire, et la convergence à la vitesse asymptotique optimale pendant la phase asymptotique. Par rapport aux solutions existantes les plus efficaces aux problèmes d’optimisation stochastique parcimonieux, nous proposons une amélioration sur plusieurs aspects. Tout d'abord, nous montrons que l'algorithme proposé réduit l'erreur initiale avec une vitesse linéaire (comme un algorithme déterministe de descente de gradient, utilisant l'observation complète du gradient), avec un taux de convergence optimal par rapport aux caractéristiques du bruit. Deuxièmement, nous obtenons ce taux pour une grande classe de modèles de bruit, y compris les distributions sous-gaussiennes, de Rademacher, de Student multivariées, etc. Enfin, ces résultats sont obtenus sous la condition optimale sur le niveau de parcimonie qui peut approcher le nombre total d'iterations de l'algorithme (à un facteur logarithmique près).A goal of this thesis is to explore several topics in optimization for high-dimensional stochastic problems. The first task is related to various incremental approaches, which rely on exact gradient information, such as SVRG, SAGA, MISO, SDCA. While the minimization of large limit sums of functions was thoroughly analyzed, we suggest in Chapter 2 a new technique, which allows to consider all these methods in a generic fashion and demonstrate their robustness to possible stochastic perturbations in the gradient information.Our technique is based on extending the concept of estimate sequence introduced originally by Yu. Nesterov in order to accelerate deterministic algorithms.Using the finite-sum structure of the problems, we are able to modify the aforementioned algorithms to take into account stochastic perturbations. At the same time, the framework allows to derive naturally new algorithms with the same guarantees as existing incremental methods. Finally, we propose a new accelerated stochastic gradient descent algorithm and a new accelerated SVRG algorithm that is robust to stochastic noise. This acceleration essentially performs the typical deterministic acceleration in the sense of Nesterov, while preserving the optimal variance convergence.Next, we address the problem of generic acceleration in stochastic optimization. For this task, we generalize in Chapter 3 the multi-stage approach called Catalyst, which was originally aimed to accelerate deterministic methods. In order to apply it to stochastic problems, we improve its flexibility on the choice of surrogate functions minimized at each stage. Finally, given an optimization method with mild convergence guarantees for strongly convex problems, our developed multi-stage procedure, accelerates convergence to a noise-dominated region, and then achieves the optimal (up to a logarithmic factor) worst-case convergence depending on the noise variance of the gradients. Thus, we successfully address the acceleration of various stochastic methods, including the variance-reduced approaches considered and generalized in Chapter 2. Again, the developed framework bears similarities with the acceleration performed by Yu. Nesterov using the estimate sequences. In this sense, we try to fill the gap between deterministic and stochastic optimization in terms of Nesterov's acceleration. A side contribution of this chapter is a generic analysis that can handle inexact proximal operators, providing new insights about the robustness of stochastic algorithms when the proximal operator cannot be exactly computed.In Chapter 4, we study properties of non-Euclidean stochastic algorithms applied to the problem of sparse signal recovery. A sparse structure significantly reduces the effects of noise in gradient observations. We propose a new stochastic algorithm, called SMD-SR, allowing to make better use of this structure. This method is a multi-step procedure which uses the stochastic mirror descent algorithm as a building block over its stages. Essentially, SMD-SR has two phases of convergence with the linear bias convergence during the preliminary phase and the optimal asymptotic rate during the asymptotic phase.Comparing to the most effective existing solution to the sparse stochastic optimization problems, we offer an improvement in several aspects. First, we establish the linear bias convergence (similar to the one of the deterministic gradient descent algorithm, when the full gradient observation is available), while showing the optimal robustness to noise. Second, we achieve this rate for a large class of noise models, including sub-Gaussian, Rademacher, multivariate Student distributions and scale mixtures. Finally, these results are obtained under the optimal condition on the level of sparsity which can approach the total number of iterations of the algorithm (up to a logarithmic factor)

    Optimisation stochastique pour l'apprentissage machine à grande échelle : réduction de la variance et accélération

    No full text
    A goal of this thesis is to explore several topics in optimization for high-dimensional stochastic problems. The first task is related to various incremental approaches, which rely on exact gradient information, such as SVRG, SAGA, MISO, SDCA. While the minimization of large limit sums of functions was thoroughly analyzed, we suggest in Chapter 2 a new technique, which allows to consider all these methods in a generic fashion and demonstrate their robustness to possible stochastic perturbations in the gradient information.Our technique is based on extending the concept of estimate sequence introduced originally by Yu. Nesterov in order to accelerate deterministic algorithms.Using the finite-sum structure of the problems, we are able to modify the aforementioned algorithms to take into account stochastic perturbations. At the same time, the framework allows to derive naturally new algorithms with the same guarantees as existing incremental methods. Finally, we propose a new accelerated stochastic gradient descent algorithm and a new accelerated SVRG algorithm that is robust to stochastic noise. This acceleration essentially performs the typical deterministic acceleration in the sense of Nesterov, while preserving the optimal variance convergence.Next, we address the problem of generic acceleration in stochastic optimization. For this task, we generalize in Chapter 3 the multi-stage approach called Catalyst, which was originally aimed to accelerate deterministic methods. In order to apply it to stochastic problems, we improve its flexibility on the choice of surrogate functions minimized at each stage. Finally, given an optimization method with mild convergence guarantees for strongly convex problems, our developed multi-stage procedure, accelerates convergence to a noise-dominated region, and then achieves the optimal (up to a logarithmic factor) worst-case convergence depending on the noise variance of the gradients. Thus, we successfully address the acceleration of various stochastic methods, including the variance-reduced approaches considered and generalized in Chapter 2. Again, the developed framework bears similarities with the acceleration performed by Yu. Nesterov using the estimate sequences. In this sense, we try to fill the gap between deterministic and stochastic optimization in terms of Nesterov's acceleration. A side contribution of this chapter is a generic analysis that can handle inexact proximal operators, providing new insights about the robustness of stochastic algorithms when the proximal operator cannot be exactly computed.In Chapter 4, we study properties of non-Euclidean stochastic algorithms applied to the problem of sparse signal recovery. A sparse structure significantly reduces the effects of noise in gradient observations. We propose a new stochastic algorithm, called SMD-SR, allowing to make better use of this structure. This method is a multi-step procedure which uses the stochastic mirror descent algorithm as a building block over its stages. Essentially, SMD-SR has two phases of convergence with the linear bias convergence during the preliminary phase and the optimal asymptotic rate during the asymptotic phase.Comparing to the most effective existing solution to the sparse stochastic optimization problems, we offer an improvement in several aspects. First, we establish the linear bias convergence (similar to the one of the deterministic gradient descent algorithm, when the full gradient observation is available), while showing the optimal robustness to noise. Second, we achieve this rate for a large class of noise models, including sub-Gaussian, Rademacher, multivariate Student distributions and scale mixtures. Finally, these results are obtained under the optimal condition on the level of sparsity which can approach the total number of iterations of the algorithm (up to a logarithmic factor).Cette thèse vise à explorer divers sujets liés à l'analyse des méthodes de premier ordre appliquées à des problèmes stochastiques de grande dimension. Notre première contribution porte sur divers algorithmes incrémentaux, tels que SVRG, SAGA, MISO, SDCA, qui ont été analysés de manière approfondie pour les problèmes avec des informations de gradient exactes. Nous proposons une nouvelle technique, qui permet de traiter ces méthodes de manière unifiée et de démontrer leur robustesse à des perturbations stochastiques lors de l'observation des gradients. Notre approche est basée sur une extension du concept de suite d'estimation introduite par Yurii Nesterov pour l'analyse d'algorithmes déterministes accélérés.Cette approche permet de concevoir de façon naturelle de nouveaux algorithmes incrémentaux offrant les mêmes garanties que les méthodes existantes tout en étant robustes aux perturbations stochastiques.Enfin, nous proposons un nouvel algorithme de descente de gradient stochastique accéléré et un nouvel algorithme SVRG accéléré robuste au bruit stochastique. Dans le dernier cas il s'agit essentiellement de l'accélération déterministe au sens de Nesterov, qui préserve la convergence optimale des erreurs stochastiques.Finalement, nous abordons le problème de l'accélération générique. Pour cela, nous étendons l'approche multi-étapes de Catalyst, qui visait à l'origine l'accélération de méthodes déterministes. Afin de l'appliquer aux problèmes stochastiques, nous le modifions pour le rendre plus flexible par rapport au choix des fonctions auxiliaires minimisées à chaque étape de l'algorithme. Finalement, à partir d'une méthode d'optimisation pour les problèmes fortement convexes, avec des garanties standard de convergence, notre procédure commence par accélérer la convergence vers une région dominée par le bruit, pour converger avec une vitesse quasi-optimale ensuite. Cette approche nous permet d'accélérer diverses méthodes stochastiques, y compris les algorithmes à variance réduite. Là encore, le cadre développé présente des similitudes avec l'analyse d'algorithmes accélérés à l'aide des suites d'estimation. En ce sens, nous essayons de combler l'écart entre l'optimisation déterministe et stochastique en termes d'accélération de Nesterov. Une autre contribution est une analyse unifiée d'algorithmes proximaux stochastiques lorsque l'opérateur proximal ne peut pas être calculé de façon exacte.Ensuite, nous étudions des propriétés d'algorithmes stochastique non-Euclidiens appliqués au problème d'estimation parcimonieuse. La structure de parcimonie permet de réduire de façon significative les effets du bruit dans les observation du gradient. Nous proposons un nouvel algorithme stochastique, appelé SMD-SR, permettant de faire meilleur usage de cette structure. Là encore, la méthode en question est une routine multi-étapes qui utilise l'algorithme stochastique de descente en miroir comme élément constitutif de ses étapes. Cette procédure comporte deux phases de convergence, dont la convergence linéaire de l'erreur pendant la phase préliminaire, et la convergence à la vitesse asymptotique optimale pendant la phase asymptotique. Par rapport aux solutions existantes les plus efficaces aux problèmes d’optimisation stochastique parcimonieux, nous proposons une amélioration sur plusieurs aspects. Tout d'abord, nous montrons que l'algorithme proposé réduit l'erreur initiale avec une vitesse linéaire (comme un algorithme déterministe de descente de gradient, utilisant l'observation complète du gradient), avec un taux de convergence optimal par rapport aux caractéristiques du bruit. Deuxièmement, nous obtenons ce taux pour une grande classe de modèles de bruit, y compris les distributions sous-gaussiennes, de Rademacher, de Student multivariées, etc. Enfin, ces résultats sont obtenus sous la condition optimale sur le niveau de parcimonie qui peut approcher le nombre total d'iterations de l'algorithme (à un facteur logarithmique près)

    Estimate Sequences for Variance-Reduced Stochastic Composite Optimization

    No full text
    short version of preprint arXiv:1901.08788International audienceIn this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. This point of view covers the stochastic gradient descent method, variants of the approaches SAGA, SVRG, and has several advantages: (i) we provide a generic proof of convergence for the aforementioned methods; (ii) we show that this SVRG variant is adaptive to strong convexity; (iii) we naturally obtain new algorithms with the same guarantees; (iv) we derive generic strategies to make these algorithms robust to stochastic noise, which is useful when data is corrupted by small random perturbations. Finally, we show that this viewpoint is useful to obtain new accelerated algorithms in the sense of Nesterov

    Estimate Sequences for Variance-Reduced Stochastic Composite Optimization

    No full text
    short version of preprint arXiv:1901.08788International audienceIn this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. This point of view covers the stochastic gradient descent method, variants of the approaches SAGA, SVRG, and has several advantages: (i) we provide a generic proof of convergence for the aforementioned methods; (ii) we show that this SVRG variant is adaptive to strong convexity; (iii) we naturally obtain new algorithms with the same guarantees; (iv) we derive generic strategies to make these algorithms robust to stochastic noise, which is useful when data is corrupted by small random perturbations. Finally, we show that this viewpoint is useful to obtain new accelerated algorithms in the sense of Nesterov

    民事訴訟法之研討(四)

    Get PDF
    International audienceIn this paper, we introduce various mechanisms to obtain accelerated first-order stochastic optimization algorithms when the objective function is convex or strongly convex. Specifically, we extend the Catalyst approach originally designed for deterministic objectives to the stochastic setting. Given an optimization method with mild convergence guarantees for strongly convex problems, the challenge is to accelerate convergence to a noise-dominated region, and then achieve convergence with an optimal worst-case complexity depending on the noise variance of the gradients. A side contribution of our work is also a generic analysis that can handle inexact proximal operators, providing new insights about the robustness of stochastic algorithms when the proximal operator cannot be exactly computed

    Estimate Sequences for Variance-Reduced Stochastic Composite Optimization

    No full text
    short version of preprint arXiv:1901.08788International audienceIn this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. This point of view covers the stochastic gradient descent method, variants of the approaches SAGA, SVRG, and has several advantages: (i) we provide a generic proof of convergence for the aforementioned methods; (ii) we show that this SVRG variant is adaptive to strong convexity; (iii) we naturally obtain new algorithms with the same guarantees; (iv) we derive generic strategies to make these algorithms robust to stochastic noise, which is useful when data is corrupted by small random perturbations. Finally, we show that this viewpoint is useful to obtain new accelerated algorithms in the sense of Nesterov

    Sparse recovery by reduced variance stochastic approximation

    No full text
    In this paper, we discuss application of iterative Stochastic Optimization routines to the problem of sparse signal recovery from noisy observation. Using Stochastic Mirror Descent algorithm as a building block, we develop a multistage procedure for recovery of sparse solutions to Stochastic Optimization problem under assumption of smoothness and quadratic minoration on the expected objective. An interesting feature of the proposed algorithm is its linear convergence of the approximate solution during the preliminary phase of the routine when the component of stochastic error in the gradient observation which is due to bad initial approximation of the optimal solution is larger than the "ideal" asymptotic error component owing to observation noise "at the optimal solution." We also show how one can straightforwardly enhance reliability of the corresponding solution by using Median-of-Means like techniques. We illustrate the performance of the proposed algorithms in application to classical problems of recovery of sparse and low rank signals in linear regression framework. We show, under rather weak assumption on the regressor and noise distributions, how they lead to parameter estimates which obey (up to factors which are logarithmic in problem dimension and confidence level) the best known to us accuracy bounds
    corecore